NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

Fast kernel-based association testing of non-linear genetic effects for biobank-scale data

https://doi.org/10.1038/s41467-023-40346-2

Fu, Boyang; Pazokitoroudi, Ali; Sudarshan, Mukund; Liu, Zhengtong; Subramanian, Lakshminarayanan; Sankararaman, Sriram (December 2023, Nature Communications)

Abstract Our knowledge of non-linear genetic effects on complex traits remains limited, in part, due to the modest power to detect such effects. While kernel-based tests offer a versatile approach to test for non-linear relationships between sets of genetic variants and traits, current approaches cannot be applied to Biobank-scale datasets containing hundreds of thousands of individuals. We propose, FastKAST, a kernel-based approach that can test for non-linear effects of a set of variants on a quantitative trait. FastKAST provides calibrated hypothesis tests while enabling analysis of Biobank-scale datasets with hundreds of thousands of unrelated individuals from a homogeneous population. We apply FastKAST to 53 quantitative traits measured across ≈ 300 K unrelated white British individuals in the UK Biobank to detect sets of variants with non-linear effects at genome-wide significance.
more » « less
Full Text Available
Modeling fine-grained spatio-temporal pollution maps with low-cost sensors

https://doi.org/10.1038/s41612-022-00293-z

Iyer, Shiva R.; Balashankar, Ananth; Aeberhard, William H.; Bhattacharyya, Sujoy; Rusconi, Giuditta; Jose, Lejo; Soans, Nita; Sudarshan, Anant; Pande, Rohini; Subramanian, Lakshminarayanan (October 2022, npj Climate and Atmospheric Science)

Abstract The use of air quality monitoring networks to inform urban policies is critical especially where urban populations are exposed to unprecedented levels of air pollution. High costs, however, limit city governments’ ability to deploy reference grade air quality monitors at scale; for instance, only 33 reference grade monitors are available for the entire territory of Delhi, India, spanning 1500 sq km with 15 million residents. In this paper, we describe a high-precision spatio-temporal prediction model that can be used to derive fine-grained pollution maps. We utilize two years of data from a low-cost monitoring network of 28 custom-designed low-cost portable air quality sensors covering a dense region of Delhi. The model uses a combination of message-passing recurrent neural networks combined with conventional spatio-temporal geostatistics models to achieve high predictive accuracy in the face of high data variability and intermittent data availability from low-cost sensors (due to sensor faults, network, and power issues). Using data from reference grade monitors for validation, our spatio-temporal pollution model can make predictions within 1-hour time-windows at 9.4, 10.5, and 9.6% Mean Absolute Percentage Error (MAPE) over our low-cost monitors, reference grade monitors, and the combined monitoring network respectively. These accurate fine-grained pollution sensing maps provide a way forward to build citizen-driven low-cost monitoring systems that detect hazardous urban air quality at fine-grained granularities.
more » « less
VACCINE: Using Contextual Integrity For Data Leakage Detection

https://doi.org/10.1145/3308558.3313655

Shvartzshnaider, Yan; Pavlinovic, Zvonimir; Balashankar, Ananth; Wies, Thomas; Subramanian, Lakshminarayanan; Nissenbaum, Helen; Mittal, Prateek (May 2019, Proceedings of the 2019 World Wide Web Conference)

Modern enterprises rely on Data Leakage Prevention (DLP) systems to enforce privacy policies that prevent unintentional flow of sensitive information to unauthorized entities. However, these systems operate based on rule sets that are limited to syntactic analysis and therefore completely ignore the semantic relationships between participants involved in the information exchanges. For similar reasons, these systems cannot enforce complex privacy policies that require temporal reasoning about events that have previously occurred. To address these limitations, we advocate a new design methodology for DLP systems centered on the notion of Contextual Integrity (CI).We use the CI framework to abstract real-world communication exchanges into formally defined information flows where privacy policies describe sequences of admissible flows. CI allows us to decouple (1) the syntactic extraction of flows from information exchanges, and (2) the enforcement of privacy policies on these flows. We applied this approach to built VACCINE, a DLP auditing system for emails. VACCINE uses state-of-the-art techniques in natural language processing to extract flows from email text. It also provides a declarative language for describing privacy policies. These policies are automatically compiled to operational rules that the system uses for detecting data leakages. We evaluated VACCINE on the Enron email corpus and show that it improves over the state of the art both in terms of the expressivity of the policies that DLP systems can enforce as well as its precision in detecting data leakages.
more » « less
Full Text Available

Search for: All records